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Abstract 

Reservoir computing is a recently introduced, highly efficient bio-inspired approach for processing 
time dependent data. The basic scheme of reservoir computing consists of a non linear recurrent 
dynamical system coupled to a single input layer and a single output layer. Within these constraints 
many implementations are possible. Here we report an opto-electronic implementation of reservoir 
computing based on a recently proposed architecture consisting of a single non linear node and a 
delay line. Our implementation is sufficiently fast for real time information processing. We illustrate 
its performance on tasks of practical importance such as nonlinear channel equalization and speech 
recognition, and obtain results comparable to state of the art digital implementations. 
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I. INTRODUCTION 



The remarkable speed and multiplexing capability of optics makes it very attractive for 
information processing. These features have enabled the telecommunications revolution of 
the past decades. However, so far they have not been exploited insomuch as computation 
is concerned. The reason is that optical nonlinearities are very difficult to harness: it re- 
mains challenging to just demonstrate optical logic gates, let alone compete with digital 
electronics [1]. This suggests that a much more flexible approach is called for, which would 
exploit as much as possible the strenghts of optics without trying to mimic digital electron- 
ics. Reservoir computing PHIU). a recently introduced, bio-inspired approach to artificial 
intelligence, may provide such an opportunity. 

Here we report the first experimental reservoir computer based on an opto-electronic 
architecture. As nonlinear element we exploit the sine nonlinear ity of an integrated Mach- 
Zehnder intensity modulator (a well known, off-the-shelf component in the telecommuni- 
cations industry), and to store the internal states of the reservoir computer we use a fiber 
optics spool. We report results comparable to state of the art digital implementations for 
two tasks of practical importance: nonlinear channel equalization and speech recognition. 

Reservoir computing, which is at the heart of the present work, is a highly successful 
method for processing time dependent information. It provides state of the art performance 
for tasks such as time series prediction [4j (and notably won a financial time series prediction 
competition[TT]), nonlinear channel equalization or speech recognition [T^14| . For some 
of these tasks reservoir computing is in fact the most powerful approach known at present. 

The central part of a reservoir computer is a nonlinear recurrent dynamical system that 
is driven by one or multiple input signals. The key insight behind reservoir computing is 
that the reservoir's response to the input signal, i.e., the way the internal variables depend 
on present and past inputs, is a form of computation. Experience shows that in many cases 
the computation carried out by reservoirs, even randomly chosen ones, can be extremely 
powerful. The reservoir should have a large number of internal (state) variables. The exact 
structure of the reservoir is not essential: for instance, in some works the reservoir closely 
mimics the interconnections and dynamics of biological neurons in a brain [6], but many 
other architectures are possible. 

To achieve useful computation on time dependent input signals, a good reservoir should 
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be able to compute a large number of different functions of its inputs. That is, the reservoir 
should be sufficiently high-dimensional, and its responses should not only depend on present 
inputs but also on inputs up to some finite time in the past. To achieve this, the reservoir 
must operate at the threshold of dynamical instability which ensures that the system exhibits 
a "fading memory", meaning that it will gradually forget previous inputs. 

Reservoir computing is a versatile and flexible concept. This follows from two key points: 

1) many of the details of the nonlinear reservoir itself are unimportant except for the dy- 
namic regime (threshold of instability) which can be tuned by some global parameters; and 

2) the only part of the system that is trained is a linear output layer. Because of this flexi- 
bility, reservoir computing is amenable to a large number of experimental implementations. 
Thus proof of principle demonstrations have been realized in a bucket of water [15j and using 
an analog VLSI chip [16], and arrays of semiconductor amplifiers have been considered in 
simulation [17]. However, it is only very recently that an analog implementation with per- 
formance comparable to digital implementations has been reported: namely, the electronic 
implementation presented in |18] . 

Our experiment is based on a similar architecture as that of [18j, namely a single non 
linear node and a delay line. The main differences are the type of non linearity and the 
desynchronisation of the input with respect to the period of the delay line. These differ- 
ences highlight the flexibility of the concept. The performance of our experiment on two 
benchmark tasks, isolated digit recognition and non linear channel equalization, is compara- 
ble to state of the art digital implementations of reservoir computing. Compared to|18|. our 
experiment is almost 6 orders of magnitude faster, and a further 2-3 orders of magnitude 
speed increase should be possible with only small changes to the system. 

The flexibility of reservoir computing and its success on hard classification tasks makes 
it a promising route for realizing computation in physical systems other than digital elec- 
tronics. In particular it may provide innovative solutions for ultra fast or ultra low power 
computation. In Supplementary Material 1 we describe reservoir computing in more detail 
and provide a road map for building high performance analog reservoir computers. In the 
discussion we discuss how to measure the speed of experimental reservoir computers. 
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II. RESULTS 



A. Principles of Reservoir Computing 

Before introducing our implementation, we recall a few key features of reservoir comput- 
ing; for a more detailed treatment of the underlying theory, we refer the reader to Supple- 
mentary Material 1. 

As is traditional in the literature, we will consider tasks that are defined in discrete 
time, e.g., using sampled signals. We denote by u{n) the input signal, where n gZ is the 
discretized time; by x{n) the internal states of the system used as reservoir; and by y{n) the 
output of the reservoir. A typical evoultion law for x{n) is x{n + 1) = f{Ax{n) + fnu{n)), 
where / is a nonlinear function, A is the time independent connection matrix and m is the 
time independent input mask. Note that in our work we will use a slightly different form 
for the evolution law, as explained below. 

In order to perform the computation one needs a readout mechanism. To this end we 
define a subset Xj(n), < i < — 1 (also in discrete time) of the internal states of the 
reservoir. It is these states which are observed and used to build the output. The time 
dependent output is obtained in an output layer by taking a linear combination of the 
internal states of the reservoir y{n) = Xlilo^ WiXi{n). The readout weights Wi are chosen to 
minimize the Mean Square Error (MSE) between the estimator y{n) and a target function 
y{n): 



over a set of examples (the training set). Because the MSE is a quadratic function of the 
Wi the optimal weights can be easily computed from the knowledge of Xi{n) and y{n). In a 
typical run, the quality of the reservoir is then evaluated on a second set of examples (the 
test set). After training, the Wi are kept fixed. 

B. Principles of our implementation 

In the present work we use an architecture related to that used in [18] and to the minimum 
complexity networks studied in [19]. As in [18J, the reservoir is based on a non-linear 
system with delayed feedback (a class of systems widely studied in the nonlinear dynamics 
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community, see e.g. |20]) and consists of a single nonlinear node and a delay loop. The 
information about the previous internal state of the reservoir up to some time T in the past 
is stored in the delay loop. After a period T of the loop, the entire internal state has been 
updated (processed) by the nonlinear node. In contrast to the work described in [18j, the 
nonlinear node in our implementation is essentially instantaneous. Hence, in the absence of 
input, the dynamics of our system can be approximated by the simple recursion 

x{t) = sin {a-x{t-T) + (p) (2) 

where a (the feedback gain) and (f (the bias) are adjustable parameters and we have explicitly 
written the sine nonlinearity used in our implementation. 

We will use this system to perform useful computation on input signals u{n) evolving 
in discrete time n G Z. As the system itself operates in continuous time, we need to define 
ways to convert input signal(s) to continuous time and to convert the system's state back 
to discrete time. The first is achieved by using a sample and hold procedure. We obtain 
a piecewise constant function u{t) of the continuous variable t : u{t) = u{n), nT' < t < 
{n + 1)T'. The time T' < T is taken to be less than or equal to the period T of the delay 
loop; when T' 7^ T we are in the unsynchronised regime (see below). To discretize the 
system's state, we note that the delay line acts as a memory, storing the delayed states of 
the nonlinearity. From this large-dimensional state space, we take samples by dividing the 
input period T' into A^ segments, each of duration 9 and sampling the state of the delay line 
at a single point with periodicity 9. This provides us with A^ snapshots of the nonlinearity's 
reponse to each input sample u{n). From these snapshots, we construct A^ discrete-time 
sequences Xj(n) = x{nT' + (i + 1/2)9) {i = 0, 1, ...A^ — 1) to be used as reservoir states from 
which the required (discrete-time) output is to be constructed. 

Without further measures, all such recorded reservoir states would be identical, so for 
computational purposes our system is one- dimensional. In order to use this system as a 
reservoir computer, we need to drive it in such a way that the Xi{n) represent a rich variety 
of functions of the input history. It is often helpful |9l [19] to use an "input mask" that 
breaks the symmetry of the system. In [18j good performance was improved by using a 
nonlinear node with an intrinsic time scale longer than the time scale of the input mask. In 
the present work we also use the "input mask", but as our nonlinearity is instantaneous, we 
cannot exploit its intrinsic time scale. We instead chose to desynchronize the input and the 
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reservoir, that is, we hold the input for a time T' which differs shghtly from the period T of 
the delay loop. This allows us to use each reservoir state at time n for the generation of a 
new different state at time n + 1 (unlike the solution used in |18] where the intrinsic time 
scale of the nonlinear node makes the successive states highly correlated). We now explain 
these important notions in more detail. 

The input mask m{t) = m{t + T') is a periodic function of period T'. It is piecewise 
constant over intervals of length 6, i.e., m(t) = rrij when nT' + jO < t < riT' + (j + 1)6*, for 
j = 0,l,...,A^ — 1. The values rrij of the mask are randomly chosen from some probability 
distribution. The reservoir is driven by the product v{t) = (5m{t)u{t) of the input and the 
mask, with (5 an adjustable parameter (the input gain). The dynamics of the driven system 
can thus be approximated by 

x{t) = sin {ax{t - T) + (3m{t)u{t) + (^) (3) 
It follows that the reservoir states can be approximated by 

Xi{n) = sin {aXi{n — 1) + (3miu{n) + ip) (4) 
when T' = T (the synchronized regime) or more generally as 

{sin [aXi-k {n — \) + f3miU [n) + v^) k < i < N 
(5) 
sin {axN+i-k {n — 2) + jSmiU (n) + ip) < i < k 

when T' = jiJ^T, [k E {!,..., — 1}) (the unsynchronized regime). In the synchronized 
regime, the reservoir states correspond to the responses of uncoupled discrete-time dy- 
namical systems which are similar, but slightly different through the randomly chosen mj. 
In the unsynchronized regime, with a desynchronization T — T' = k6 , the state equations 
become coupled, yielding a much richer dynamics. With an instantaneous nonlinearity, 
desynchronisation is necessary to obtain a set of state transformations that is useful for 
reservoir computing. We beleive that it will also be useful when the non linearity has an 
intrinsic time scale, as it provides a very simple way to enrich the dynamics. 

In summary, by using an input mask, combined with desynchronization of the input and 
the feedback delay, we have turned a system with a one-dimensional information represen- 
tation into an N-dimensional system. 
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C. Hardware setup 



The above architecture is implemented in the experiment depicted in Fig. [TJ The sine 
nonhnearity is implemented by a voltage driven intensity modulator (Lithium Niobate Mach 
Zehnder interferometer), placed at the output of a continuous light source, and the delay 
loop is a fiber spool. A photodiode converts the light intensity I{t) at the output of the fiber 
spool into a voltage; this is mixed with an input voltage generated by a function generator 
and proportional to m{t)u{t), amplified, and then used to drive the intensity modulator. 
The feedback gain a is set by adjusting the average intensity Iq of the light inside the fiber 
loop with an optical attenuator. By changing a we can bring the system to the edge of 
dynamical instability. The nonlinear dynamics of this system have already been extensively 
studied, see |2T1 - [23] . The dynamical variable x{t) is obtained by rescaling the light intensity 
to lie in the interval [—1,+!] through x{t) = 2/(t)//o — 1. Then, neglecting the effect of 
the bandpass filter induced by the electronic amplifiers, the dynamics of the system is given 
by eq. ([s]) where a is proportional to Iq. Equation ([s]), as well as the discretized versions 
thereof, eqs. (g and g, are derived in the supplementary material; the various stages of 
processing of the reservoir nodes and inputs are shown in Fig. |2} 

In our experiment the round trip time is T = 8.504pis and we typically use = 50 
internal nodes. The parameters a and (3 in eq. ([S]) are adjusted for optimal performance 
(their optimal value may depend on the task), while (f is usually set to 0. The intensity I{t) 
is recorded by a digitizer, and the estimator y{n) is reconstructed offline on a computer. 

We illustrate the operations of our reservoir computer in FigjS] where we consider a 
very simple signal recognition task. Here, the input to the system is taken to be a random 
concatenation of sine and square waves; the target function y{n) is for a sine wave and 1 for 
a square wave. The top panel of Fig. |3] shows the input to the reservoir: the blue line is the 
representation of the input in continuous time u{t). In the bottom panel, the output of the 
network after training is shown with red crosses, against the desired output represented by a 
blue line. The performance on this task is essentially perfect: the Normalized Mean Square 
Error NMSE = ;^ Ei=i (l/H - ^H) Vz ELi (?/(^))^i38j reaches NMSE ~ 1.5 • IQ-^, 
which is significantly better than the results reported using simulations in [T7] . 
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D. Experimental results 



We have checked the performance of this system extensively in simulations. First of 
all, if we neglect the effects of the bandpass filters, and neglect all noise introduced in our 
experiment, we obtain a discretized system described by eq. ([s]) which is similar to (but 
nevertheless distinct from) the minimum complexity reservoirs introduced in |19) . We have 
checked that this discretized version of our system has performance similar to usual reser- 
voirs on several tasks. This shows that the chosen architecture is capable of state of the art 
reservoir computing, and sets for our experimental system a performance goal. Secondly we 
have also developed a simulation code that takes into account all the noises of the experi- 
mental components, as well as the effects of the bandpass filters. These simulations are in 
very good agreement with the experimentally measured dynamics of the system. They allow 
us to efficiently explore the experimental parameter space, and to validate the experimen- 
tal results. Further details on these two simulation models are given in the supplementary 
information. 

We apply our optoelectronic reservoir to three tasks. These tasks are benchmarks which 
have been widely used in the reservoir computing community to evaluate the performance 
of reservoirs. They therefore allow comparison between our experiment and state of the art 
digital implementations of reservoir computing. 

For the first task, we train our reservoir computer to behave like a Nonlinear Auto 
Regressive Moving Average equation of order 10, driven by white noise (NARMAlO). More 
precisely, given the white noise u{n), the reservoir should produce an output y{n) which 
should be as close as possible to the response y{n) of the NARMAlO model to the same 
white noise. The task is described in detail in the methods section. The performance is 
measured by the Normalised Mean Square Error (NMSE) between output y{n) and target 
y{n). For a network of 50 nodes, both in simulations and experiment, we obtain a NMSE = 
0.168 ± 0.015. This is similar to the value obtained using digital reservoirs of the same size. 
For instance a NMSE value of 0.15 ± 0.01 is reported in pi] also for a reservoir of size 50. 

For our second task we consider a problem of practical relevance: the equalization of a 
nonlinear channel. We consider a model of a wireless communication channel in which the 
input signal d{n) travels through multiple paths to a nonlinear and noisy receiver. The task 
is to reconstruct the input d{n) from the output u{n) of the receiver. The model we use was 
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introduced in |25| and studied in the context of reservoir computing in Our results, given 
in Fig. |4| are one order of magnitude better than those obtained in [23] with a nonhnear 
adaptive filter, and comparable to those obtained in |1] with a digital reservoir. At 28 dB of 
signal to noise ratio, for example, we obtain an error rate of 1.3 ■ 10~^, while the best error 
rate obtained in [25j is 4 ■ 10~^ and in [4J error rates between 10~^ and 10~^ are reported. 

Finally we apply our reservoir to isolated spoken digits recognition using a benchmark 
task introduced in the reservoir computing community in |26]. The performance on this task 
is measured using the Word Error Rate (WER) which gives the percentage of words that 
are wrongly classified. Performances reported in the literature are a WER of 0.55% using a 
hidden Markov model |27]; WERs of 4.3% [26j, of 0.2% [I2], of 1.3% [19j for reservoir com- 
puters of different sizes and with different post processing of the output. The experimental 
reservoir presented in [18] reported a WER of 0.2%. Our experiment yields a WER of 0.4%, 
using a reservoir of 200 nodes. 

Further details on these tasks are given in the methods section and in Supplementary 
Material 2. 

III. DISCUSSION 

We have reported the first demonstration of an opto-electronic reservoir computer. Our 
experiment has performance comparable to state of the art digital implementations on bench- 
mark tasks of practical relevance such as speech recognition and channel equalization. Our 
work demonstrates the flexibility of reservoir computers that can be readily reprogrammed 
for different tasks. Indeed by re-optimising the output layer (that is, choosing new readout 
weights Wk), and by readjusting the operating point of the reservoir (changing the feedback 
gain a, the input gain /3, and possibly the bias (f) one can use the same reservoir for many 
different tasks. Using this procedure, our experimental reservoir computer has been used 
successively for tasks such as signal classification, modeling a dynamical system (NARMAlO 
task), speech recognition, and nonlinear channel equalization. 

We have introduced a new feature in the architecture, as compared to the related exper- 
iment reported in [TH]. Namely by desynchronizing the input with respect to the period of 
the reservoir we conserve the necessary coupling between the internal states, but make a 
more efficient use of the internal states as the correlations introduced by the low pass filter 
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in |18| are not necessary. 

Our experiment is also the first implementation of reservoir computing fast enough for 
real time information processing. It can be converted into a high speed reservoir computer 
simply by increasing the bandwidth of all the components (an increase of at least 2 orders of 
magnitude is possible with off-the-shelf optoelectronic components). We note that in future 
realisations it will be necessary to have an analog implementation of the pre-processing of the 
input (digitisation and multiplication by the input mask) and of the post-processing of the 
output (multiplication by output weights), rather than the digital pre- and post-processing 
used in the present work. 

From the point of view of applications, the present work thus constitutes an important 
step towards building ultra high speed optical reservoir computers. To help achieve this 
goal, in the supplementary material we present guidelines for building experimental reser- 
voir computers. Whether optical implementations can eventually compete with electronic 
implementations is an open question. From the fundamental point of view, the present work 
helps understanding what are the minimal requirements for high level analog information 
processing. 
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Methods 
NARMAIO task. 

Auto Regressive models and Moving Average models, and their generalisation Nonlinear 
Auto Regressive Moving Average Models (NARMA), are widely used to simulate time series. 
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The NARMAlO model is given by the recurrence 

y{n + 1) = 0.3y{n) + 0.05y(n) y{n - i) j + l.^u{n - ^)u{n) + 0.1 (6) 

where u{n) is a sequence of random inputs drawn from an uniform distribution over the 
interval [0,0.5]. The aim is to predict the y{n) knowing the u(n).This task was introduced 
in |28) . It has been widely used as a benchmark in the reservoir computing community, see 
for instance [IHl El |29] 

Nonlinear channel equalization. 

This task was introduced in [25], and used in the reservoir computing community in |3] 
and [24]. The input to the channel is an i.i.d. random sequence d{n) with values from 
{—3, —1, +1, +3}. The signal first goes through a linear channel, yielding 

q{n) = 0.08d{n + 2)-0.12d{n + 1) + d{n) + 0.18d{n-l) 

-0.1d{n-2) + 0.091c/(n-3)-0.05c/(n-4) (7) 
+0Md{n-5) + 0md{n-6) + 0.01rf(n-7) 

It then goes through a noisy nonlinear channel, yielding 

u{n) = q{n) + 0.036g(n)^-0.011g(n)3 + v{n) (8) 

where v{n) is an i.i.d. Gaussian noise with zero mean adjusted in power to yield signal-to- 
noise ratios ranging from 12 to 32 db. The task is, given the output u{n) of the channel, 
to reconstruct the input d{n). The performance on this task is measured using the Symbol 
Error Rate, that is the fraction of inputs d{n) that are misclassified (Ref. |24| used another 
error metric on this task). 

Isolated spoken digit recognition. 

The data for this task is taken from the NIST TI-46 corpus |30] . It consists of ten spoken 
digits (0...9), each one recorded ten times by five different female speakers. These 500 spoken 
words are sampled at 12.5 kHz. This spoken digit recording is preprocessed using the Lyon 
cochlear ear model [31| . The input to the reservoir Uj{n) consists of an 86-dimensional 
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state vector (j = 1, 86) with up to 130 time steps. The number of variables is taken to be 
N — 200. The input mask is taken to be a A?" x 86 dimensional matrix 6^ with elements taken 
from the the set {—0.1, +0.1} with equal probabihties. The product bijUj{n) of the mask 
with the input is used to drive the reservoir. Ten linear classifiers yk{n) {k = 0, ...,9) are 
trained, each one associated to one digit. The target function for yk{n) is +1 if the spoken 
digit is k, and -1 otherwise. The classifiers are averaged in time, and a winner-takes-all 
approach is applied to select the actual digit. 

Using a standard cross-vahdation procedure, the 500 spoken words are divided in five 
subsets. We trained the reservoir on four of the subsets, and then tested it on the fifth 
one. This is repeated five times, each time using a different subset as test, and the average 
performance is computed. The performance is given in terms of the Word Error Rate, that 
is the fraction of digits that are misclassified. We obtain a WER of 0.4% (which correspond 
to 2 errors in 500 recognized digits). 
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Figure 1: Scheme for the experimental set-up. The red and green parts represent respectively 
the optical and electronic components. The optical part of the setup is fiber based, and operates 
around 1550nm (standard telecommunication wavelength). "M-Z" : Lithium Niobate Mach-Zehnder 
modulator. "99": DC voltage determining the operating point of the M-Z modulator. "Combiner" 
: electronic coupler adding the feedback and input signals. "AWG": arbitrary waveform generator. 
A computer generates the input signal for a task and feeds it into the system using the arbitrary 
waveform generator. The response of the system is recorded by a digitiser and retrieved by the 
computer which optimizes the read-out function in a post processing stage. The feedback gain a is 
adjusted by changing the average intensity inside the loop with the optical attenuator. The input 
gain /3 is adjusted by changing the output voltage of the function generator by a multiplicative 
factor. The bias Lp is adjusted by using a DC voltage to change the operating point of the M-Z 
modulator. The operation of the system is fully automated and controlled by a computer using 
MATLAB scripts. 
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Figure 2: Schematic diagram of the information flow in the experiment depicted in Fig. [T] On the 
plot we have represented four reservoir nodes at different stages of processing, labelled according 
to equation |5] with k = 1. Starting from the bottom, and going clockwise, a input value u{n) gets 
multiplied by an input gain /3 and a mask value m^, then mixed with the previous node state 
axi-k{n — 1). The result goes through the sine function to give the new state of the reservoir 
Xj(n), which then gets amplified by a factor a and, after the delay, will get mixed with a new input 
u{n + 1). All the network states Xi{n) are also collected by the readout unit, multiplied by their 
respective weights Wi and added together to give the desired output y{n). 
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Figure 3: Signal classification task. The aim is to differentiate between square and sine waves. 
The top panel shows the input u{t), a stepwise constant function resulting from the discretization 
of successive step and sine functions. The bottom panel shows in red crosses the output of the 
reservoir y (n). The target function (dashed line in the lower panel) is equal to 1 when the input 
signal is a step function and to when the input signal is a sine function. The Normalized Mean 
Square Error, evaluated over 1000 inputs, is NMSE ~ 1.5 10~^. 
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Supplementary Material 1: 

An Introduction to Reservoir Computing 

for Physicists and Engineers 

Introduction 

Understanding how physical systems can process information is a major scientific chal- 
lenge. The tremendous progress that has been accomplished during the past century has 
given rise to whole new fields of science, such as computer science and digital (silicon) elec- 
tronics, or quantum information. Information processing by classical analog systems is a 
case apart because remarkable examples are found in the biological sciences (e.g. the brain), 
but our understanding is still very incomplete. Thus, we understand at the basic level many 
aspects of how information is processed using biochemical reactions in cells, or how infor- 
mation is processed by neurons in a brain, to take two examples. But how these elementary 
processes are organized at a higher level, and what is the large scale architecture of these 
systems, still escapes us. Understanding these issues would be of great conceptual interest. 
It could also have huge technological repercussions, as it could completely change the way 
we build information processing machines. That this tremendous scope for progress exists is 
illustrated by the approximately 6 orders of magnitude gap in energy consumption between 
a brain and a present day silicon computer [39j. 

So far most work on information processing in analog systems has been based on imitating 
biological systems. This has given rise to the field of artificial neural networks. More abstract 
approaches have also been developed, such as hidden Markov models, or support vector 
machines. Reservoir computing |2H8] provides an alternative line of attack. The interest of 
reservoir computing is that 1) it proposes architectures that are quite different from those 
studied up to now; and 2) despite a relatively recent history -the first papers on the topic date 
from 2002-, its performances are comparable to, and sometimes even exceed, those of other 
approaches to machine learning[40j. A deeper understanding of reservoir computing could 
provide new insights into how analog classical systems, and in particular biological systems. 
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process information. Additionally it could give rise to novel approaches for implementing 
computation and enable new applications. 

The aim of the present text is not to duplicate existing reviews of reservoir computing 
[HI [TOj. Rather, we wish to present the subject from the point of view of the physicist 
or engineer who wishes to build an experimental reservoir computer. We will argue that 
reservoir computing provides a quite detailed, but also quite flexible, road map towards 
building physical systems that process information with architectures very different from 
those used up to now. 



Understanding Reservoir Computing 

To understand the potentialities of reservoir computing, it is best to start with an exam- 
ple, and then examine to what extent the example can be generalized. Finally, on the basis 
of this discussion, we outline a road map for building experimental reservoir computers. 
The archetypal reservoir computer is constructed as follows. It consists of a finite number 
of internal variables Xi{n) {i = 0, N — 1) evolving in discrete time n G Z. The evolution 
of the variables Xi{n) is perturbed by an external input u{n). The evolution equations for 
the reservoir is: 



Xi{n + 1) = tanh 



'N-l 



aijXj{n) + hiu{n) 



1=0 



(9) 



where the (time independent) coefficients and hi are chosen independently at random from 
a simple distribution, for instance Gaussian distributions with mean zero and variances a? 
and 6^. The dynamics of this system (called the "reservoir") is fixed (i.e. the coefficients 
ttij and hi are fixed) . The dimensionality of the reservoir is typically much larger than 
the input dimensionality. Given proper parameters o? and 6^, the reservoir will also have 
memory of past inputs. The use of the tanh nonlinearity in eq. ^ is traditional, but by no 
means essential, see below. 

The aim of reservoir computing is to perform a computation on the input u{n). To this 
end one maps the internal state of the reservoir to an output signal denoted y{n). The 
output should be as close as possible to a given target function y{n). An example of this 
would be the detection of a stereotypical pattern in the input sequence such as spoken digits 
in a stream of audio, or a certain header in digital communication. Another example would 
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be trying to predict future inputs such as electrical load prediction or financial time-series 
prediction. 

The output y{n) of the reservoir computer at time n is a linear combination of the high 
dimensional internal states of the reservoir at time n: 

N-l 

y{n) = Y,WMn). (10) 

1=0 

The Wi are usualUy chosen as follows. For some time interval, n G we simulate 

the reservoir such that one knows both the inputs u{n) and the target function y{n). We 
minimize, over this training interval, the Mean Square Error between y{n) and y{n): 

1 ^ 

MSE=-5^(y(r^)-yH)^ (11) 

n=l 

This can be efficiently performed using standard linear regression techniques. [In cases where 
a classification of the inputs is desired, one may train several outputs (one for each class) 
and then use a winner takes all approach, or one may try to optimize the miss-classification 
rate using techniques such as logistic regression, Gaussian discriminant analysis, regression 
to the indicator function or a linear support vector machine.] Note that in general one 
should apply some form of regularization (to reduce the model complexity) to the readout 



function eq. (10). To this end one uses optimisation techniques that prefer models with 
small parameters Wi. This is accomplished by using e.g. ridge regression. 

The coefficients Wi are now fixed and the reservoir computer is ready for use: one can 
input into the reservoir new inputs u{n) and obtain the corresponding output y{n). Because 
of the training procedure where we ensured proper generalization, the output will continue 
to be close to the target function, even for new data. It is considered good practice to check 
this. To this end one estimates the performance of the reservoir in a test phase during which 
new inputs are fed into the reservoir, and one compares, using the previously chosen Wi and 
an appropriate error metric, how close the output is to the target function. 

Several remarks are now in order: 

1. For good performance, it is necessary to scale the coefficients by a multiplicative factor 

aij — )• aaij and bi — )■ f3bi . 

(In other words, if the coefficients are drawn from normal distributions with mean 
zero, one adjusts the variances of the normals). These parameters are often called 
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the feedback gain and input gain respectively. The necessity for this scaling can be 
understood intuitively as follows: if the coefficients Ojj are too small, then the state 
equation ^ is strongly damped; if the coefficients too large, then the system 

has highly complex dynamics or is even chaotic, and its behavior is very sensitive 
to small changes in the input. The coefficients bi are adjusted in order to have an 
appropriate ratio between the contribution of the terms Ylj ^ij^ji''^) source 
term hiu{n). They also determine how non-linearly the inputs are expanded in the 
reservoir states. 

2. Finding the optimal values of Wi is immediate in the case of linear regression. Indeed 



inserting eq. (10) into eq. (11) and extremizing with respect to VFj, one finds that the 



optimal choice is 

7V-1 

= E ^^P^ (12) 

where is the inverse of the correlation matrix R^j = Yl^=i Xi{n)xj{n) and Pj = 
^ ^1=1 This should be contrasted to what would happen if one tried to 

optimize the coefficients aij and bi. In this case there are many more coefficients to 
optimize, and furthermore it is difficult to find the optimal value, as one may very 



easily end up in local optima. The use of a linear readout eq. (10) is thus highly 
advantageous. 

3. Many variants on the above architecture have been investigated with success. We can 
list: 

(a) Using other nonlinearities than tanh, such as the biological inspired models which 
approximate the way real neurons interact with each other (spiking neuron mod- 
els). 

(b) Using sparse connection matrices a^j (i.e. matrices with most of the elements set 
to zero), so as to decrease the computational resources required to simulate the 
evolution. 

(c) Using multiple inputs Uk{n), k = 1, K. In this case one simply adds an index 
to the coefficients hi. The source term in eq. ^ thus becomes Ylk^ikUk{n). 
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(d) Using the output y{n) as input to the reservoir. This modification allows training 
the reservoir to behave as a given dynamical system [21 S] . 



4. The above approach is extremely powerful. To take two examples: 

(a) Given a time series u{n) for times n = 1, T, the task is to guess the subsequent 
values of the time series u{T + 1), u{T + 2), ... The time series could for instance 
come from the evolution of a chaotic dynamical system, such as the Mackey-Glass 



system. Using the trick |3d[ results which outperform all other known approaches 
for predicting chaotic time series were obtained [4j. Reservoir computing also 
won an international competition on prediction of future evolution of financial 
time series 

(b) Reservoir computing has been extensively applied to speech recognition. Using 
well known data sets, reservoir computing outperforms other approaches for iso- 
lated digit recognition |T2| [H]. Recently reservoir computing has been applied 
to a more complex task: phoneme recognition. Performances comparable to the 
best alternative methods were obtained using 3 coupled reservoirs, each with 
= 20000 variables 



Road map for building information processing systems 

The basic setup of reservoir computing, although typically implemented in software, 
suggests many promising new avenues to implement computation in analog dynamic systems. 
The theoretical requirements for reservoir computing to be computationally universal (in 
the analog sense) j6] are very loose: the reservoir is required to have fading memory, to 
be excitable by the input and a high dimensional readout must be possible. Many physical 
systems could be conceived that adhere to these rules and could thus potentially be turned in 
universal computing machines. However turning these general ideas into a working machine 
in more difficult. 

If one wishes to build an experimental reservoir computer, then it is essential to under- 
stand what are the constraints, but also the design freedom. In this respect, a number of 
important lessons can be learned from the example presented in the previous section. 
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1. The fact that the interconnection matrix aij and the coefficients bi in eq. (|9| are 
chosen at random is extremely important. It means that fine tuning of a large num- 
ber of coefficients is not necessary for a good reservoir computer. Rather almost all 
interconnection matrices will give good performance. 

2. The fact that the tanh in eq. ^ can be changed into other non-linearities is also 
extremely important. It means that one does not have to imitate specific non-linear 
behavior (such as the specific dynamics of neurons in a biological brain), but one can 
use the non-linearities that are easily accessible experimentally. 

3. The fact that the only coefficients that are task specific are the Wi (the weights of 
the readout function) implies that a given dynamical system can be used for many 
different information processing tasks. It also implies that one can separate the design 
and analysis of the reservoir itself from the design and analysis of the readout function. 

On the basis of these remarks, a reservoir computer can be built out of a dynamical system 
that satisfies the following constraints: 

1. It should consist of a large number (say 50, or more, but this depends on the task 
and on the specifics of the dynamic system) dynamical variables which are coupled 
together by a non-linear evolution equation. 

2. The evolution of the dynamical system can be perturbed by the external input. 

3. As much as possible, one should try to break all symmetries of the system (this is 
the message coming from the fact that the aij and bi are random: there is no residual 
structure/symmetry in the dynamics). To this end one should privilege dynamical 
systems that depend on a large number of random parameters. 

4. A few global parameters must be experimentally tunable. These global parameters are 
used to adjust the operating point of the system (typically to just below the threshold 
of instability), and to adjust the overall weight given to the external inputs (this 
corresponds to the scaling of the coefficients and bi). 

5. It should be possible to read-out the state of a large number (better all) of the dy- 
namical variables, or at least to construct the readout function y = ^ ■ WiXi with 
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adjustable weights Wj. Note that a reservoir computer can function satisfactorily even 
if only a subset of the dynamical variables are read out. 

Once these constraints are satisfied, one can proceed to test the system (either using numeri- 
cal simulations, or using the experimental realization). In the reservoir computing literature, 
there exists a series of (somewhat) standardized tests on which to evaluate the performance 
of the system. Some of these tests have a theoretical justification (e.g. linear memory ca- 
pacity); others are interesting tasks which have been often used by the community. These 
tasks thus provide benchmarks with which to compare the performances of different reservoir 
computers. Possible tasks to study include: 

1. Linear memory capacity and memory function [3j. In this task the target function 
y{n) = u{n — k) is simply the input k time steps in the past. Performance on this 
task indicates how well the reservoir's state can be used to recover past inputs. If the 
linear memory capacity is small, then the reservoir will be unable to carry out tasks 
that require long memory of the input. 

2. Non-linear memory capacity, see In this task the target function is a non- 
linear function of the past inputs, for instance the product y{n) = u{n — k)u{n — k'). 
Performance on this task measures how much non-linear processing of the input is 
carried out by the reservoir. 

3. Simulating specific Nonlinear Auto Regressive Moving Average (NARMA) systems. 
For the NARMA tasks, the aim is to use the reservoir to simulat the response of 
a NARMA system driven by a random input. Two variants are widely used: a 10th 
order system (NARMAlO) introduced in [28j, and the more difficult 30th order system 
(NARMA30) (see for instance [35j for a definition). 

4. Predicting the evolution of the trajectory of chaotic attractors such as the attractor 
of the Mackey-Glass system 

5. Speech recognition: Several benchmarks have been published, going from Japanese 
vowel recognition |14j, to isolated spoken digits p^, to phoneme recognition |13] . 

This (non-exhaustive) list of benchmarks provides a natural road-map for future experi- 
mental reservoirs. First master easy tasks such as signal classification, or isolated digit 
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recognition. Then go on to moderately harder tasks such as NARMAlO or NARMA30. Fi- 
nally, one can imagine tackling hard tasks such as phoneme recognition. In all cases, measure 
the linear and nonlinear memory functions, as they will give task independent information 
on how the reservoir is processing information. 

This road-map is appealing. Steps along it have been followed using reservoirs based on 
water waves in a bucket |15] . on the cerebral cortex of a cat [36], on analog VLSI chips 
|16| . using a (numerically simulated) array of coupled semi-conductor optical amplifiers |17] . 
However it is only very recently that it has been possible, using an electronics implementation 
and a delayed feedback loop, to demonstrate an analog reservoir computer with performance 
comparable to that of digital reservoirs on a non trivial task (in this case isolated digit 
recognition) [18j. 



Open questions and perspectives 

The above road map leaves open a number of important questions. We outline the most 
obvious: 

1. Effect of noise. In experimental realizations of reservoir computing, there will in- 
evitably be noise and imperfections. How will these affect the performance of the 
scheme? There has not been an exhaustive analysis of this issue, but a few remarks 
are in order. One should distinguish between noise within the reservoir itself and noise 
in the readout. 

(a) Noise in the reservoir itself is deleterious. However, reservoir computers can 
continue to work even with moderate levels of noise. Indeed the noise can be 
viewed as an unwanted input, and the aim is to perform the desired task while 
ignoring this second unwanted input. A possible approach to counteract the effect 
of noise would simply be to increase the size of the reservoir. But this remains 
to be studied in detail. 

(b) Noise in the readout can be beneficial. Indeed adding noise to the readout is a 
trick, equivalent to ridge regularization, often used in numerical simulations to 
increase the robustness of reservoirs. 



23 



2. Best non-linearity. Experience with digital simulations of reservoirs shows that sig- 
moidal non-linearities, such as tanh, give very good performance. However in experi- 
mental realizations, other non-linearities may be much easier to implement. Experience 
suggests that the best non-linearity depends to some extent on the task at hand. Also 
sub-optimal non-linearities can presumably be compensated by increasing the size of 
the reservoir. 

3. Continuous time and continuous space. In numerical simulations, it is by far easier 
to work with a discrete set of variables Xi{n), i = 0,...,N — 1 evolving in discrete 
time n G Z. But in experimental systems, it is often much more natural to work with 
continuous variables and continuous time. A theory of reservoir computing operating 
with continuous variables evolving in continuous time remains to be written, although 
some theoretical progress has been made along these hues ^7] . 

Finally, what is the future of experimental reservoir computing? On the one hand the 
quest is fundamental: can we build analog systems that perform high-level information 
processing? Will we in this way gain new insights into how biological information processing 
systems operate? On the other hand experimental reservoir computers could ultimately have 
practical applications. Indeed they are based on architectures completely different from those 
used in present day digital electronics. Presumably the place to look for applications is at 
the frontier where digital electronics has difficulty coping, such as information processing at 
ultra high speeds, or with ultra low energy consumption, or for tasks which are hard to code 
using standard programming methods. 
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Supplementary Material 2 



Experimental System, Numerical 



Simulations, Tasks 



Here we describe in more detail our experimental system (schematized in Fig. 1 of the main 
text), the algorithm used to simulate it numerically, and the tasks we study to validate its 
performance as a reservoir. 

Undriven system 

We first consider the free running dynamics of our system, in the absence of input. 

The light source consists of a Laser at the wavelength 1564nm (in the standard C band 
of optical telecommunication). It produces a time independent intensity Iq. 

The light passes through an intensity modulator (Mach-Zehnder modulator) driven by 
a time dependent voltage V (t). The light intensity just after the modulator is I (t) = 
/qCOS^ (^tt^^ + <pj where is a constant voltage (the voltage which is needed to go from 
a maximum to the next minimum of light at the output of the modulator) and can be 
adjusted by applying a DC bias voltage to the modulator. Taking (f) — Sn / A + (p allows us 
to rewrite 



The light passes trough a tunable optical attenuator, enabling the adjustment of the 
loop's gain through the variation of the value Iq. It propagates through a long spool of 
fiber (1.7 km of single mode optical fiber in our experiment), and is then detected by a 
photodiode integrated with a transimpedance amplifier. The resulting voltage is amplified 
again (by a RF amplifier) to produce the voltage V (t) that drives the intensity modulator. 
In our experiment, the photodiode (with its transimpedance amplifier) and RF amplifier 
operate in a hnear regime, hence V (t) is proportional to I {t — T) where T is the round trip 
time of the oscillator. However, one must take into account that the photodiode acts as a 




(13) 
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lowpass filter and that the RF amphfiers act as highpass filters. Both the photodiode and 
the amplifiers also add noise. Thus, approximating the filters by first order filters, we have 
V (u) = A ^^^^j^ uii+jui ^^^'^ {j +n{uj)^ where j is the imaginary unit, tilde denotes the 
Fourier transform, A denotes the amplification factor, uji <^ Uq are the cutoff frequencies 
of the resulting bandpass filter, and n(t) is the white noise. In the time domain we have 
V{t) = AT^^^i^^ {I {t — T) + n (t)) where J-Lo^i denotes the linear bandpass filter. 

Because of the high pass filter, we can use as variable the fluctuations around the mean 
intensity. We define x (t) = ^^^\^/2^'^ (i-e., we rescale the intensity I(t) to lie in the interval 



[— 1, +1]) whereupon eq. (13) becomes 

X (t) = sin J-^o^, (^x (t - T) + + (14) 

where a = ^^^^ is experimentally adjustable by changing the input light intensity Iq. Ex- 
perimentally it can be tuned over the range a G [0,4.2]. 

Tj^ical values of the parameters are T = S.Sps, uo/2'k = 125MHz, ui/2tt = 50kHz, and 
the amplitude of n (t) //q is approximately 3.5%. At the output of the intensity modulator, 
an optical splitter enables us to take approximatively 3% of the optical signal in order to 
measure the optical intensity in the fiber by the mean of a second amplified photodiode. 
The resulting voltage is digitized with a National Instrument PXI card at the sample rate 
of 200 megasamples per second, or can be measured with a digital oscilloscope. 

If we increase Jq gradually, the intensity I (t) undergoes a bifurcation diagram typical of 
nonlinear dynamical systems. Figure |5] shows the excellent agreement between the exper- 
imentally observed bifurcation diagram and the one obtained by numerical simulations of 
the evolution equations (with the bias if = 0). In this bifurcation diagram, the number of 
bifurcations before reaching chaotic behavior is strongly affected by the amount of noise in 
the system. Comparing this bifurcation diagram to simulations is the most precise way we 
have to estimate the amount of noise in our experimental setup. The estimated value is in 
agreement with measurements carried out on each component of the setup separately. We 
also verified that the thickness of the branches inside the bifurcation diagram is mainly due 
to the noise level of oscilloscope. 
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Driven system 



We now consider the addition of an input term. The input is characterized by a new time 
scale T' < T. The scalar input u (n) (evolving in discrete time n G Z) and the input mask 
bi are transformed into a continuous input s {t) as follows: 



s (t) = hiU in) for t G 



■ ^, iV _ {i + l)T 



0,...,Ar-l , n G Z (15) 



where is the number of nodes in the reservoir. The input mask values 6j, i = 0,...,A^ — 1 
are randomly chosen from a given distribution (which may depend on the task). The time 
scale 6 over which the input s (t) changes is: 

T' 

A voltage proportional to s (t) is generated by a function generator (at a sample rate 
of 200 Msamples/s) and added to the output voltage of the amplified photodiode using a 
RF combiner placed at the entrance of the RF amplifier. So the voltage V {t) that drives 
the intensity modulator is a combination of the light intensity and the input signal. The 
dynamical equations thus become 

x{t) = sin ( aJL,,^, (x it-T) + ^ ) + (s (t)) + ^] (17) 



where /3 is experimentally adjustable by varying the output voltage amplitude of the function 
generator. In this equation, we take into account that the RF combiner is placed before the 
RF amplifier, and therefore that the source term is affected by the highpass filter J-'^^^ of the 
RF amplifier. Because of the difference between the time scales T and uji/2tt, the effect of 
the filter J^^^ is almost negligible. Its main effect is to ensure that the effective source signal 
J^Lui {s (t)) has mean value zero. 

In our experiments we take T = 8.5/iS. The number A^ of variables is taken in the range 
50 - 200. We take T' = (see figure |6|. Thus for A^ = 50, we have 6 = 167ns. The 

performance of the reservoir does not depend on the exact value of T' chosen, as long as 
T'/T is not a simple fraction (such as 1, 3/4, or 1/2), which would divide our reservoir into 
different independent subsystems. 



27 



Discretized dynamics 



To obtain the discretized dynamics, we discretise the intensity along the fiber according 

to 

Xi{n) X (nT' + (^i + 9^ i = 0, . . . , N - 1 = T' /9 (18) 

where we suppose that 9 = T' /N = T / {N + k) with k integer (see figure [6]). We thus have 
that the physical time t is related to n,i, k through 

Upon neglecting the effects of the filters J-'^^ and J-"^^ we obtain the discretized version of eq. 



(17). Note that the absence of synchronisation (T' ^ T, or equivalently k > 0) completely 
modifies the dynamics by coupling the discretised variables Xj to each other. Note also that 
the wrap arround effect (the second line in eq. (|5])) does not appear in traditional reservoir 
computing. 



Numerical simulations 



Two different numerical models were developed to study the capabilities of the network: 
a 'discretized' model, closer to the standard formulation of Echo State Networks, and a 
'continuous' model, which is as close as possible to our experimental apparatus. 

In the 'discretized' version of the model we implement the discretization described by eq. 
(§. No noise is considered, and the bandpass effects of the various components are neglected. 
The sine nonlinearity and the topology of the network are preserved. The optimal operating 
point of the system is found by tuning the parameters a and /3 in eq. ([s]). This model 
is used to set a performance goal for our experimental system: if the performance of the 
experiment is close to the one of the model, then our system is robust enough with respect to 
the noise and the effects introduced by each of its components. Moreover, the performance 
of the 'discretized' model is the same, within the experimental error, to the one of traditional 
networks as reported in ^4], allowing us to validate the chosen nonlinearity and topology 
as good choices for a reservoir computer. 

The 'continuous' Matlab model we developed is instead as close as possible to the experi- 
ment. All the signals in the simulation are discretized at 200 Msamples/s. This corresponds 
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to the sample rate of the arbitrary waveform generator and the digitizer. All the components 
are represented by their transfer function at their respective operating point (sine function 
for the Mach-Zehnder modulator, responsivity and transimpedance gains of the photodiodes, 
gain of the amplifier). The collective frequency response of all the components is represented 
by a bandpass filter with first order slopes. This is a reasonable approximation to the exact 
slope of the bandpass filters which was measured using a vector network analyzer in open 
loop configuration. Noise is added to the signal at each noisy element of the system (dark 
current of the photodiode, noise added by each amplifier...). 

The dynamics of the model correspond very closely to the experiment. This is illustrated 
for instance by the clear agreement of the simulated and measured bifurcation diagram (see 
fig. [5]) and by the clear agreement of the simulated and measured performances on the 
channel equalization task, see Fig. |4]in the main text. 

The continuous model allows us to easily explore the sensitivity to parameters, such 
as noise level or the shape of the nonlinearity, which can't always be reached with the 
experiment. 

Post processing 

The light intensity I{t) in the fiber loop is converted to a voltage by a photodiode and 
recorded by the digitizer operating at 200MS/s. From the intensity I{t) recorded during a 
time T' we extract discrete variable values Xj(n), i = 0, ...,N — 1. This is carried out as 
follows. The intensity I{t) is divided into pieces of duration 6. We neglect the first quarter 
and the last quarter of the data points recorded over the duration 6 and associate to Xi{n) 
the average of the remaining data points. This procedure in which the beginning and end of 
each interval 6 is omitted allows us to not be affected by the transients as the intensity goes 
from one value to the other, and also allows an efficient synchronization of our system. The 
estimator y{n) is then obtained by taking a linear combination y{n) = ^^^WiXiln) where 
the weights Wi are optimized. This post processing is carried out offline, on a computer. 
Fig. |6] shows an example of the input sent to the reservoir, the reservoir response and the 
discretization operated on the reservoir output. 
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Tasks 



In our work we considered several tasks. We review them in detail. 
Signal classification 

This is a simple task that we use for a first evaluation of the performance of the reser- 
voir. The input u{n) to the system consists of random sequences of sine and square waves 
discretized into 12 points per period. The mask values bi are drawn from the uniform distri- 
bution over the interval [0, +1]. The reservoir size is taken to be = 50. The output y{n) 
should be 1 when the signal is the square wave, and when it is the sine. The weights Wi 
are obtained by minimizing the MSE between y{n) and the ideal output. Experimentally 
we obtain NMSE ~ 1.5 10^^ , which corresponds to essentially perfect operation for this 
task. These experimental results are in close agreement with those obtained using numerical 
simulations of our reservoir. 

For comparison, practically the same task was studied previously in [17j. An error rate 
(percentage of time the signal was misclassified) of 2.5% was obtained. 

Nonlinear channel equalization 

The task is to reconstruct the input d{n) G {—3, —1, +1, +3} of a noisy nonlinear wireless 
communication channel, given the output u{n) of the channel. The relation between d{n) 
and u{n) is 

q(n) = 0.08d{n + 2)-0.12d{n + 1) + d{n) + 0.l8d{n-l) 

-0.1d{n-2)) + 0.091d(n-3)-0.05rf(n-4) (20) 
+0.04ci(n-5) + 0.03c/(n-6) + 0.01d(n-7) 

u{n) = q{n) + 0.036g(n)^-0.011g(n)^ + u{n) (21) 

where z/(n) represents i.i.d. Gaussian noise with zero mean, adjusted in power to yield 
signal-to- noise ratios ranging from 12 to 32 dB. This task was introduced in |25j and it was 
shown in [4j that reservoir computing could significantly improve performance on this task. 
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In our study, the input mask is taken to be uniformly distributed over the interval 
[— 1,+1]. The reservoir size is taken to be = 50. To perform the task, we first ob- 
tain an estimator y{n) of the input by minimizing the MSE between y{n) and d{n). We 
then obtain an estimator d{n) by replacing y{n) by the discretized value {—3, —1, +1, +3} 
to which it is closest. Finally we estimate the Symbol Error Rate (SER), i.e., the fraction 
of time that d{n) differs from d{n). The performance of the network is calculated as the 
average performance over 10 different input sequences, in which the first 3000 samples form 
the training set and the following 6000 samples form the test set. The SER on the test set is 
then studied as a function of Signal to Noise Ratio at the input. The values reported in figure 
|4]in the main text are the average SER for 10 different trials, while the error bars represent 
the standard deviation of the SER for the same trials. It should be noted that 6000 test 
steps are the maximum number of steps that the arbitrary waveform generator in our setup 
allows. Hence, when SERs approach 10"'^, our average SERs include trials where two errors, 
one error, or no error at all have been made by the reservoir. This means that the error 
bars on the experimental data where SERs are close to 10"'* might be overestimated. In 
contrast, data from simulations do not suffer from this effect, as we can arbitrarily increase 
the number of test samples for a more precise measurement. 

For comparison, at a SNR ratio of 28dB, the three models studied in |25) gave SER of 
2 ■ 10-^4 ■ 10-^1.5 ■ 10"^ while the reservoir studied in g] gave SERs of 1 ■ lO"'' to 1 ■ 10"^ 
At the same SNR of 28dB, our experimental system gives a SER of 1.3 ■ 10"**. 



NARMAIO 



In this task the aim is to reproduce the behavior of a nonlinear, tenth-order system with 
random input drawn from a uniform distribution over the interval [0,0.5]. The equation 
defining the target system is given by 

y{n + 1) = 0.3y{n) + 0.05y(n) y{n - i) j + l.^u{n - 9)u{n) + 0.1 (22) 

For this task, the mask is uniformly distributed over [0, +1] and the reservoir size is 
A^ = 50. We first train our system using 1000 time steps, then we test the system on a new 
sequence of 1000 inputs. The performance of the reservoir is measured by the Normalized 
Mean Square Error of the estimator y{n), averaged over 10 different pairs of train and 
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test sequences. For this NARMAlO task, the best performances (NMSE = 0.16 for the 
discretized simulation, 0.168 for the continuous simulation and 0.167 for the experiment ) 
are obtained in a very linear regime. 

Isolated spoken digit recognition 

For the isolated spoken digit recognition task, the data is taken from the NIST Tl-46 
corpus [30]. It consists of ten spoken digits (0...9), each one recorded ten times by five 
different female speakers. These 500 spoken words are sampled at 12.5 kHz. This spoken 
digit recording is preprocessed using the Lyon cochlear ear model |31]. The input to the 
reservoir Uj{n) consists of an 86-dimensional state vector (j = 1, 86) with up to 130 time 
steps. The number of variables is taken to be = 200. The input mask is taken to be 
a X 86 dimensional matrix bij with elements taken from the the set {—0.1, +0.1} with 
equal probabilities. The product ^jhijUj{n) of the mask with the input is used to drive 
the reservoir. Ten linear classifiers yk{n) {k = 0, ...,9) are trained, each one associated to 
one digit. The target function for yk{n) is +1 if the spoken digit is k, and -1 otherwise. 
The classifiers are averaged in time, and a winner-takes-all approach is applied to select the 
actual digit. 

In our study, the 500 spoken words are divided in five subsets. We trained the reservoir 
on four of the subsets, and then tested it on the fifth one. This is repeated five times in order 
to use each subset as test part. In this way we can test our system over all the speakers 
and digits, and compute an average performance . The performance is given in terms of the 
Word Error Rate, that is the fraction of digits that are misclassified. We obtain a WER of 
0.4% which correspond to 2 errors in 500 recognized digits. 

For comparison, in [26], where Reservoir Computing was first used on this spoken digit 
benchmark, a WER of 4.3% was reported for a reservoir of size 1232. In [12j a WER of 0.2% 
was obtained for a reservoir of size 308 and using the winner-takes-all approach. In [19] a 
WER of 1.3% for reservoirs of size 200 is reported. The experimental reservoir reported in 
|18| gives a WER of 0.2% using a reservoir of size 400. The Sphinx-4 system [27J uses a 
completely different method based on Hidden Markov Models and achieves a WER of 0.55% 
on the same data set. 
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Figure 4: Results for nonlinear channel equalization. The horizontal axis is the Signal to Noise Ratio 
(SNR) of the channel. The vertical axis is the Symbol Error Rate (SER), that is the fraction of 
input symbols that are misclassified. Results are plotted for the experimental setup (black circles), 
the discrete simulations based on eq. (|5| (blue rhombs), and the continuous simulations that take 
into account noise and bandpass filters in the experiment (red squares). All three sets of results 
agree within the statistical error bars. Error bars on the experimental points relative to 24, 28 and 
32 dB might be overestimated (see Supplementary Material 2). The results are practically identical 
to those obtained using a digital reservoir in 
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Figure 5: Simulated (left panel) and measured (right panel) bifurcation diagram. The vertical axis 
(output states) is proportional to the optical intensity in the optical fiber for the simulations, and to 
the voltage at the output of the readout photodiode for the measurements (this voltage is equal to 
de optical intensity multiplied by the gain of the amplified photodiode). The gray scale represents 
the histogram of optical intensities inside the system as a function of the feedback gain. When the 
feedback gain is lower than unity, only one value of the light intensity is possible. For feedback gain 
slightly larger than unity, the light intensity oscillates between two values. For even larger feedback 
gain (around 2), the nonbijective nature of the Mach-Zehnder modulator's transfer function leads 
to oscillation between multiple light intensity levels or even to a chaotic behavior. The number of 
bifurcations before reaching the chaotic behavior is determined by the amount of noise inside the 
system. The thickness of the branches in the measured bifurcation diagram is due to the noise 
added by the oscilloscope. 
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Figure 6: Operation of the experimental reservoir. The blue line represents the masked input 
driving the reservoir; the red line represents the reservoir response to the depicted input as measured 
by the oscilloscope. Both have been normalized so that their values lie between -1 and 1 for the 
whole length of the experiment. The white squares along the red line represent the discretized 
values of the reservoir states, obtained by averaging the reservoir output on a time interval of 
amplitude 0/2. The upper part of the figure shows the reservoir operation for the first 8 inputs of 
the reservoir; note that the input is zero up to t « 0.8 • 10~^. The bottom left panel shows a detail 
from the measurement on the response to the very first input value, when no feedback is present 
yet, and we only see the instantaneous response of the Mach-Zehnder. The panel on the bottom 
right depicts the response of the system to the 5th input, where the effect of the feedback is clearly 
visible: the amplitudes of the reservoir states are no longer just related to the instantaneous input, 
but are influenced by previous inputs as well. 
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